Reading in The Flow Cytometer Files

The flow cytometer files are stored the Cytometer Data directory one level above the current directory, hence we go back one level and read the files from this folder. However, we have to analyse the metadata files first to determine the files that are good. This enables us to only read files that matter which will save considerable computing time and effort.

What Measurements are Good?

To determine if a measurement from the flow cytometer is good, the cells per microlitre \((cells/\mu L)\) should be between \(50\) and \(1000\). The dilution factor that will result \(cells/\mu L\) within this range is normally not known in advance, hence the flow cytometry experiment is performed at different diltuion factors (\(2 - 3\) dilution factors). Furthermore, the flow cytometer is configured to count at least \(5000\) gated events, it is also desired to ignore files that doesn’t meet this threshold. Technically if the \(cells/\mu L\) is below \(50\), the number of gated events will be lesser than \(5000\) but the same cannot be said of files with \(cells/\mu L\) above \(1000\). The above discussed concepts are already implemented in the goodfcs() function from the cyanoFilter package. Hence, we use this to determine which of the dilution levels are good.

#list all folders in the Cytometer Data folder
metafiles <- list.dirs("../Cytometer Data")
nms <- sapply(str_split(metafiles[2:length(metafiles)], "/"), "[[", 3)

#reading in all meta data file and determining their status
meta_files_data <- lapply(metafiles[2:length(metafiles)], function(x) {
  inside <- list.files(x, full.names = T)
  #all csv files without the word GroupStats
  csv.ext <- which(tools::file_ext(inside) == "csv" & 
                    !str_detect(inside, "GroupStats"))
  csv.toberead <- inside[csv.ext]
  
  if(length(csv.ext) == 1) {
    
   metadata <- read.csv(csv.toberead, skip = 7,
                         stringsAsFactors = F, 
                         check.names = T)
   metadata <- metadata[, 1:65]
   #clean up the Cells.muL column
   names(metadata)[which(str_detect(names(metadata), "Cells."))] <- "CellspML"
   first_names <- sapply(str_split(metadata$Sample.ID, "_"), "[[", 1)
   metadata$Strain <- str_extract(first_names, "[4-5]")
   metadata$Status <- goodfcs(metadata, mxd_cellpML = 1000, mnd_cellpML = 50)
   id_names <- lapply(str_split(metadata$Sample.ID, "_"), function(x) {
        nms <- ifelse(length(x) < 3, paste(x[1], x[2], sep = "_"), 
               paste(x[1], x[2], x[3], sep = "_"))
    })
    metadata$Sample.ID2 <- unlist(id_names)
    metadata$File <- 1
    
  } else if (length(csv.ext) > 1) {
    
    metadatas <- lapply(1:length(csv.toberead), function(i) {
      ddata <- read.csv(csv.toberead[i], skip = 7, stringsAsFactors = F, check.names = T)
      ddata <- ddata[, 1:65]
      ddata$File <- i
      return(ddata)
    })
    metadata <- do.call(rbind.data.frame, metadatas)
    names(metadata)[which(str_detect(names(metadata), "Cells."))] <- "CellspML"
    first_names <- sapply(str_split(metadata$Sample.ID, "_"), "[[", 1)
    metadata$Strain <- str_extract(first_names, "[4-5]")
    metadata$Status <- goodfcs(metadata, mxd_cellpML = 1000, mnd_cellpML = 50)
    id_names <- lapply(str_split(metadata$Sample.ID, "_"), function(x) {
    nms <- ifelse(length(x) < 3, paste(x[1], x[2], sep = "_"), 
           paste(x[1], x[2], x[3], sep = "_"))
    })
    metadata$Sample.ID2 <- unlist(id_names)
    
  }
  return(metadata)
})

names(meta_files_data) <- nms

The block of code above goes into all sub-directory in the Cytometer Data, read all csv files that are not associated with GroupStats (note that the csv files associated with the instrument settings and GroupStats are saved in each directory with an **_GroupStats** for the GroupStats csv files), clean untidy variable names and determine their status (good or bad). The Status and File (file number) are added as an extra column to the csv files. A file number \(1\) implies the data associated to that sample is contained in the first fcs file while \(2\) implies it is in the second. Below is what a sample of the metafiles look like after the preprocessing above.

meta_files_data[[1]]

Files Retained

Since all samples were examined at different dilution levels, it implies that there is the possibility of having a sample being good at different dilution levels, therefore there is need for a choice to be made. We will discuss two strategies for making a choice of files to retain based on \(cells/\mu L\). The first choice is going for the dilution factor with the lowest \(cells/\mu L\) because this will ensure you have maximal number of events. This will be a good strategy for experiments with small number of particles. On the other hand, one could also decide to go for the dilution factor with the highest \(cells/\mu L\).

meta_files_data <- lapply(meta_files_data, function(y) {
  
  broken <- y %>% dplyr::group_by(Sample.ID2) %>% tidyr::nest()
  brokenR <- broken %>% dplyr::mutate(Retains = purrr::map(broken$data, function(.x) {
      rt <- .x %>% dplyr::mutate(Retain = retain(.x, make_decision = "mini"))
      #cbind(.x, Retain = rt)
    })
  ) 
  return(tidyr::unnest(brokenR))
})

The code above groups the csv files by sample IDS then runs the retain() function on each group. This adds a Retain column to the original data table. This column gives a decision on the files to be retained. Below is an example of how the csv files look like after the pre-processing.

meta_files_data[[1]]

Finally The Flow Cytometer Files

We are finally ready to read in the flow cytometer files. We use the same idea as with the csv files. First, we filter out all rows that are to be retained in the associated csv files. Then we list all files with the .fcs extension and read only the fcs files that are to be retained using both the File and Sample.Number columns. The File column allows us read the appropriate fcs file while the Sample.Number column allows us extract the precisely needed data from the fcs file.

fcs_files_data <- map2(.x = metafiles[2:length(metafiles)], .y = meta_files_data, 
      function(.x, .y) {
          Retained <- .y %>% dplyr::filter(Retain == "Retain")
        
          inside <- list.files(.x, full.names = T)
          fcs.ext <- which(tools::file_ext(inside) == "fcs" )
          fcs.toberead <- inside[fcs.ext]
          
          fcs.files <- lapply(1:nrow(Retained), function(i) {
                          fcs_data <- read.FCS(fcs.toberead[Retained$File1[i]], 
                                          alter.names = TRUE, 
                                          transformation = FALSE, 
                                        emptyValue = FALSE, 
                                        dataset = Retained$Sample.Number[i])
    })
    #keys <- sapply(fcs.files, keyword, "GTI$SAMPLEID")
    names(fcs.files) <- Retained$Sample.ID2
    as(fcs.files, "flowSet")
  
})

names(fcs_files_data) <- nms
pnames <- pData(phenoData(fcs_files_data$`02.04.2019`))

Filtering out Wrong Measurements and Transformation

#getting the total number of particles counted by the flow cytometer
Full_Particle_Count <- lapply(fcs_files_data, function(x) {
  fsApply(x ,nrow)
})

#removing NAs in the expression matrix
NonNas <- lapply(fcs_files_data, function (x) {
  fsApply(x, function (y) {
    nona(y)
  })
})

### ArcSinh Transformation
#arcsin_transformedSet <- lapply(NonNas, function(x) {
#  fsApply(x, function(y, toTrans) {
#    flowTrans(y, "mclMultivArcSinh", dims = toTrans, n2f = F, parameters.only = F)$result
#  }, toTrans = c("RED.R.HLin", "FSC.HLin", "RED.B.HLin", "YEL.B.HLin"))
#})

#filtering out the positive values. This is only important if 
#one plans to use log-transformation
NonNegatives <- lapply(NonNas, function (x) {
   fsApply(x, function (y) {
     noneg(y)
   })
})

### log Transformation
log_transformedSet <- lapply(NonNegatives, function(x, ntt) {
   fsApply(x, function (y, ntt2) {
     lnTrans(y, notToTransform = ntt2)
   },ntt2 = ntt)
}, ntt = c("SSC.W", "TIME"))
pair_plot(NonNas$`02.04.2019`[[1]])
Before Transformation

Before Transformation

pair_plot(log_transformedSet$`02.04.2019`[[1]])
After Transformation

After Transformation

Margin Events

logtrans_margin <- lapply(log_transformedSet, function(x) {
  dlist <- fsApply(x, function(y, ch, typ){
    cellmargin(y, Channel = ch, type = "estimate")$reducedflowframe
  }, ch = "SSC.W")
})

Population Identification

 #lapply(log_transformedSet, function(x) { 
#    fsApply(x, function(y) {
#      debris_nc(y, p1 = "RED.B.HLin", "YEL.B.HLin")
#    })
#  
#  })

Density and Abundance Computation